333 research outputs found

    Autofix for backward-fit sidechains: using MolProbity and real-space refinement to put misfits in their place

    Misfit sidechains in protein crystal structures are a stumbling block in using those structures to direct further scientific inference. Problems due to surface disorder and poor electron density are very difficult to address, but a large class of systematic errors are quite common even in well-ordered regions, resulting in sidechains fit backwards into local density in predictable ways. The MolProbity web site is effective at diagnosing such errors, and can perform reliable automated correction of a few special cases such as 180° flips of Asn or Gln sidechain amides, using all-atom contacts and H-bond networks. However, most at-risk residues involve tetrahedral geometry, and their valid correction requires rigorous evaluation of sidechain movement and sometimes backbone shift. The current work extends the benefits of robust automated correction to more sidechain types. The Autofix method identifies candidate systematic, flipped-over errors in Leu, Thr, Val, and Arg using MolProbity quality statistics, proposes a corrected position using real-space refinement with rotamer selection in Coot, and accepts or rejects the correction based on improvement in MolProbity criteria and on χ angle change. Criteria are chosen conservatively, after examining many individual results, to ensure valid correction. To test this method, Autofix was run and analyzed for 945 representative PDB files and on the 50S ribosomal subunit of file 1YHQ. Over 40% of Leu, Val, and Thr outliers and 15% of Arg outliers were successfully corrected, resulting in a total of 3,679 corrected sidechains, or 4 per structure on average. Summary Sentences: A common class of misfit sidechains in protein crystal structures is due to systematic errors that place the sidechain backwards into the local electron density. A fully automated method called “Autofix” identifies such errors for Leu, Val, Thr, and Arg and corrects over one third of them, using MolProbity validation criteria and Coot real-space refinement of rotamers.

    Protein Design Using Continuous Rotamers

    Optimizing amino acid conformation and identity is a central problem in computational protein design. Protein design algorithms must allow realistic protein flexibility to occur during this optimization, or they may fail to find the best sequence with the lowest energy. Most design algorithms implement side-chain flexibility by allowing the side chains to move between a small set of discrete, low-energy states, which we call rigid rotamers. In this work we show that allowing continuous side-chain flexibility (which we call continuous rotamers) greatly improves protein flexibility modeling. We present a large-scale study that compares the sequences and best energy conformations in 69 protein-core redesigns using a rigid-rotamer model versus a continuous-rotamer model. We show that in nearly all of our redesigns the sequence found by the continuous-rotamer model is different and has a lower energy than the one found by the rigid-rotamer model. Moreover, the sequences found by the continuous-rotamer model are more similar to the native sequences. We then show that the seemingly easy solution of sampling more rigid rotamers within the continuous region is not a practical alternative to a continuous-rotamer model: at computationally feasible resolutions, using more rigid rotamers was never better than a continuous-rotamer model and almost always resulted in higher energies. Finally, we present a new protein design algorithm based on the dead-end elimination (DEE) algorithm, which we call iMinDEE, that makes the use of continuous rotamers feasible in larger systems. iMinDEE guarantees finding the optimal answer while pruning the search space with close to the same efficiency of DEE. Availability: Software is available under the Lesser GNU Public License v3. Contact the authors for source code
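The contrast between rigid and continuous rotamers in the abstract above can be illustrated with a toy calculation. The following is a minimal sketch, not the iMinDEE algorithm: the one-dimensional energy function, the three rotamer angles, and the 9-degree window around each rotamer are all assumptions chosen for illustration. It only shows why minimizing within a window around each rotamer can never give a higher energy than evaluating the rotamer center alone.

```python
import math

# Hypothetical rigid rotamer centers for a single chi angle (degrees).
RIGID_ROTAMERS = [-60.0, 60.0, 180.0]


def angular_diff(a, b):
    """Signed difference a - b wrapped into [-180, 180) degrees."""
    return (a - b + 180.0) % 360.0 - 180.0


def energy(chi_deg):
    """Toy energy (assumed): three harmonic wells slightly off the ideal angles."""
    wells = [(-65.0, 0.0), (172.0, 0.4), (68.0, 0.9)]  # (well center, well offset)
    return min(offset + 0.01 * angular_diff(chi_deg, center) ** 2
               for center, offset in wells)


def rigid_best(rotamers):
    """Evaluate the energy only at the discrete rotamer centers."""
    best_chi = min(rotamers, key=energy)
    return energy(best_chi), best_chi


def continuous_best(rotamers, half_width=9.0, step=0.1):
    """Scan a window around each rotamer and keep the lowest energy found."""
    best_e, best_chi = math.inf, None
    n_steps = int(round(2 * half_width / step))
    for center in rotamers:
        for i in range(n_steps + 1):
            chi = center - half_width + i * step
            e = energy(chi)
            if e < best_e:
                best_e, best_chi = e, chi
    return best_e, best_chi


rigid_e, rigid_chi = rigid_best(RIGID_ROTAMERS)
cont_e, cont_chi = continuous_best(RIGID_ROTAMERS)
print(f"rigid:      E={rigid_e:.3f} at chi={rigid_chi:.1f}")
print(f"continuous: E={cont_e:.3f} at chi={cont_chi:.1f}")
```

In a real design problem the energy depends on many interacting side chains, which is why a brute-force scan like this one does not scale and a pruning algorithm is needed.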

    California Men's Health Study (CMHS): a multiethnic cohort in a managed care setting

    BACKGROUND: We established a male, multiethnic cohort primarily to study prostate cancer etiology and secondarily to study the etiologies of other cancer and non-cancer conditions. METHODS/DESIGN: Eligible participants were 45-to-69 year old males who were members of a large, prepaid health plan in California. Participants completed two surveys on-line or on paper in 2002 – 2003. Survey content included demographics; family, medical, and cancer screening history; sexuality and sexual development; lifestyle (diet, physical activity, and smoking); prescription and non-prescription drugs; and herbal supplements. We linked study data with clinical data, including laboratory, hospitalization, and cancer data, from electronic health plan files. We recruited 84,170 participants, approximately 40% from minority populations and over 5,000 who identified themselves as other than heterosexual. We observed a wide range of education (53% completed less than college) and income. PSA testing rates (75% overall) were highest among black participants. Body mass index (BMI) (median 27.2) was highest for blacks and Latinos and lowest for Asians, and showed 80.6% agreement with BMI from clinical data sources. The sensitivity and specificity can be assessed by comparing self-reported data, such as PSA testing, diabetes, and history of cancer, to health plan data. We anticipate that nearly 1,500 prostate cancer diagnoses will occur within five years of cohort inception. DISCUSSION: A wide variety of epidemiologic, health services, and outcomes research utilizing a rich array of electronic, biological, and clinical resources is possible within this multiethnic cohort. The California Men's Health Study and other cohorts nested within comprehensive health delivery systems can make important contributions in the area of men's health
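The abstract above notes that sensitivity and specificity can be assessed by comparing self-reported data to health plan data. The sketch below shows the standard calculation, treating the health plan record as the reference; the function name and the example values are hypothetical and are not taken from the study.

```python
def sensitivity_specificity(self_report, plan_record):
    """Compare self-reported yes/no answers against reference records.

    Both arguments are equal-length sequences of booleans (or 0/1),
    where plan_record is treated as the reference standard.
    """
    tp = fp = tn = fn = 0
    for reported, recorded in zip(self_report, plan_record):
        if recorded and reported:
            tp += 1  # condition recorded and also self-reported
        elif recorded and not reported:
            fn += 1  # condition recorded but not self-reported
        elif not recorded and reported:
            fp += 1  # self-reported without a matching record
        else:
            tn += 1  # neither recorded nor self-reported
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Made-up example: six participants asked about a prior diagnosis.
self_report = [1, 1, 0, 0, 1, 0]
plan_record = [1, 0, 0, 0, 1, 1]
sens, spec = sensitivity_specificity(self_report, plan_record)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```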

    Rational Design of Temperature-Sensitive Alleles Using Computational Structure Prediction

    Temperature-sensitive (ts) mutations are mutations that exhibit a mutant phenotype at high or low temperatures and a wild-type phenotype at normal temperature. Temperature-sensitive mutants are valuable tools for geneticists, particularly in the study of essential genes. However, finding ts mutations typically relies on generating and screening many thousands of mutations, which is an expensive and labor-intensive process. Here we describe an in silico method that uses Rosetta and machine learning techniques to predict a highly accurate “top 5” list of ts mutations given the structure of a protein of interest. Rosetta is a protein structure prediction and design code, used here to model and score how proteins accommodate point mutations with side-chain and backbone movements. We show that integrating Rosetta relax-derived features with sequence-based features results in accurate temperature-sensitive mutation predictions.

    Computational Design of a PDZ Domain Peptide Inhibitor that Rescues CFTR Activity

    The cystic fibrosis transmembrane conductance regulator (CFTR) is an epithelial chloride channel mutated in patients with cystic fibrosis (CF). The most prevalent CFTR mutation, ΔF508, blocks folding in the endoplasmic reticulum. Recent work has shown that some ΔF508-CFTR channel activity can be recovered by pharmaceutical modulators (“potentiators” and “correctors”), but ΔF508-CFTR can still be rapidly degraded via a lysosomal pathway involving the CFTR-associated ligand (CAL), which binds CFTR via a PDZ interaction domain. We present a study that goes from theory, to new structure-based computational design algorithms, to computational predictions, to biochemical testing and ultimately to epithelial-cell validation of novel, effective CAL PDZ inhibitors (called “stabilizers”) that rescue ΔF508-CFTR activity. To design the “stabilizers”, we extended our structural ensemble-based computational protein redesign algorithm to encompass protein-protein and protein-peptide interactions. The computational predictions achieved high accuracy: all of the top-predicted peptide inhibitors bound well to CAL. Furthermore, when compared to state-of-the-art CAL inhibitors, our design methodology achieved higher affinity and increased binding efficiency. The designed inhibitor with the highest affinity for CAL (kCAL01) binds six-fold more tightly than the previous best hexamer (iCAL35), and 170-fold more tightly than the CFTR C-terminus. We show that kCAL01 has physiological activity and can rescue chloride efflux in CF patient-derived airway epithelial cells. Since stabilizers address a different cellular CF defect from potentiators and correctors, our inhibitors provide an additional therapeutic pathway that can be used in conjunction with current methods.

    HAAD: A Quick Algorithm for Accurate Prediction of Hydrogen Atoms in Protein Structures

    Hydrogen constitutes nearly half of all atoms in proteins and their positions are essential for analyzing hydrogen-bonding interactions and refining atomic-level structures. However, most protein structures determined by experiments or computer prediction lack hydrogen coordinates. We present a new algorithm, HAAD, to predict the positions of hydrogen atoms based on the positions of heavy atoms. The algorithm is built on the basic rules of orbital hybridization followed by the optimization of steric repulsion and electrostatic interactions. We tested the algorithm using three independent data sets: ultra-high-resolution X-ray structures, structures determined by neutron diffraction, and NOE proton-proton distances. Compared with the widely used programs CHARMM and REDUCE, HAAD has a significantly higher accuracy, with the average RMSD of the predicted hydrogen atoms to the X-ray and neutron diffraction structures decreased by 26% and 11%, respectively. Furthermore, hydrogen atoms placed by HAAD have more matches with the NOE restraints and fewer clashes with heavy atoms. The average CPU cost by HAAD is 18 and 8 times lower than that of CHARMM and REDUCE, respectively. The significant advantage of HAAD in both the accuracy and the speed of the hydrogen additions should make HAAD a useful tool for the detailed study of protein structure and function. Both an executable and the source code of HAAD are freely available at http://zhang.bioinformatics.ku.edu/HAAD

    Delineation of VEGF-regulated genes and functions in the cervix of pregnant rodents by DNA microarray analysis

    BACKGROUND: VEGF-regulated genes in the cervices of pregnant and non-pregnant rodents (rats and mice) were delineated by DNA microarray and real-time PCR, after locally altering the levels or action of VEGF using VEGF agents, namely siRNA, VEGF receptor antagonist and mouse VEGF recombinant protein. METHODS: Tissues were analyzed by genome-wide DNA microarray analysis, real-time and gel-based PCR, and SEM, to decipher VEGF function during cervical remodeling. Data were analyzed by EASE score (microarray) and ANOVA (real-time PCR) followed by Scheffe's F-test for multiple comparisons. RESULTS: Of the 30,000 genes analyzed, about 4,200 genes were altered in expression by VEGF, i.e., expression of about 2,400 and 1,700 genes was down- and up-regulated, respectively. Based on EASE score, i.e., grouping of genes according to their biological process, cell component and molecular functions, a number of vascular- and non-vascular-related processes were found to be regulated by VEGF in the cervix, including immune response (including inflammatory), cell proliferation, protein kinase activity, and cell adhesion molecule activity. Of interest, mRNA levels of a select group of genes known to influence, or with the potential to influence, cervical remodeling were altered. For example, real-time PCR analysis showed that levels of VCAM-1, a key molecule in leukocyte recruitment, endothelial adhesion, and subsequent trans-endothelial migration, were elevated about 10-fold by VEGF. Further, VEGF agents also altered mRNA levels of decorin, which is involved in cervical collagen fibrillogenesis, and expression of eNO, PLC and PKC mRNA, critical downstream mediators of VEGF. Of note, we show that VEGF may regulate cervical epithelial proliferation, as revealed by SEM. CONCLUSION: These data are important in that they offer new insights into VEGF's possible roles and mechanisms in cervical events near term, including cervical remodeling.

    Investigation of Atomic Level Patterns in Protein-Small Ligand Interactions

    BACKGROUND: Shape complementarity and non-covalent interactions are believed to drive protein-ligand interaction. To date protein-protein, protein-DNA, and protein-RNA interactions were systematically investigated, which is in contrast to interactions with small ligands. We investigate the role of covalent and non-covalent bonds in protein-small ligand interactions using a comprehensive dataset of 2,320 complexes. METHODOLOGY AND PRINCIPAL FINDINGS: We show that protein-ligand interactions are governed by different forces for different ligand types, i.e., protein-organic compound interactions are governed by hydrogen bonds, van der Waals contacts, and covalent bonds; protein-metal ion interactions are dominated by electrostatic force and coordination bonds; protein-anion interactions are established with electrostatic force, hydrogen bonds, and van der Waals contacts; and protein-inorganic cluster interactions are driven by coordination bonds. We extracted several frequently occurring atomic-level patterns concerning these interactions. For instance, 73% of investigated covalent bonds were summarized with just three patterns in which bonds are formed between thiol of Cys and carbon or sulfur atoms of ligands, and nitrogen of Lys and carbon of ligands. Similar patterns were found for the coordination bonds. Hydrogen bonds occur in 67% of protein-organic compound complexes and 66% of them are formed between NH- group of protein residues and oxygen atom of ligands. We quantify relative abundance of specific interaction types and discuss their characteristic features. The extracted protein-organic compound patterns are shown to complement and improve a geometric approach for prediction of binding sites. CONCLUSIONS AND SIGNIFICANCE: We show that for a given type (group) of ligands and type of the interaction force, majority of protein-ligand interactions are repetitive and could be summarized with several simple atomic-level patterns. 
We summarize and analyze 10 frequently occurring interaction patterns that cover 56% of all considered complexes and we show a practical application for the patterns that concerns interactions with organic compounds

    Comparison study on k-word statistical measures for protein: From sequence to 'sequence space'

    BACKGROUND: Many proposed statistical measures can efficiently compare protein sequences to further infer protein structure, function and evolutionary information. They share the same idea of using k-word frequencies of protein sequences. Given a protein sequence, the information on its related protein sequences has not been used for protein sequence comparison until now. This paper proposes a scheme to construct a protein 'sequence space' associated with the protein sequences related to the given protein, and compares the performance of statistical measures with and without the information in the protein 'sequence space'. This paper also presents two statistical measures for proteins: gre.k (generalized relative entropy) and gsm.k (gapped similarity measure). RESULTS: We tested statistical measures with and without the protein 'sequence space' on three data sets. This not only offers a systematic and quantitative experimental assessment of these statistical measures, but also naturally complements the available comparison of statistical measures based on protein sequence. Moreover, we compared our statistical measures with alignment-based measures and the existing statistical measures. The experiments were grouped into two sets. The first one, performed via ROC (receiver operating characteristic) analysis, aims at assessing the intrinsic ability of the statistical measures to discriminate and classify protein sequences. The second set of experiments aims at assessing how well our measure does in phylogenetic analysis. Based on the experiments, several conclusions can be drawn and, from them, novel valuable guidelines for the use of protein 'sequence space' and statistical measures were obtained. CONCLUSION: Alignment-based measures have a clear advantage when the data is highly redundant. The most efficient statistical measure is the novel gsm.k introduced by this article, followed by cos.k. When the data becomes less redundant, gre.k, proposed by us, achieves a better performance, but all the other measures perform poorly on classification tasks. Almost all the statistical measures achieve improvement by exploring the information on 'sequence space' as word length increases, especially for less redundant data. The reasonable results of phylogenetic analysis confirm that Gdis.k based on 'sequence space' is a reliable measure for phylogenetic analysis. In summary, our quantitative analysis verifies that exploring the information on 'sequence space' is a promising way to improve the abilities of statistical measures for protein comparison.
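The shared idea of k-word frequencies mentioned in the abstract above can be sketched in a few lines. This is a generic illustration of an alignment-free comparison, namely the cosine of the angle between two k-word count vectors; it is not the paper's definition of gre.k, gsm.k, or Gdis.k, and the function names and example sequences are made up for the illustration.

```python
from collections import Counter
import math


def kword_counts(seq, k):
    """Count every overlapping word of length k in the sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def cosine_similarity(seq_a, seq_b, k=2):
    """Cosine of the angle between the k-word count vectors of two sequences."""
    counts_a = kword_counts(seq_a, k)
    counts_b = kword_counts(seq_b, k)
    # Only words present in both sequences contribute to the dot product.
    dot = sum(counts_a[w] * counts_b[w] for w in counts_a.keys() & counts_b.keys())
    norm_a = math.sqrt(sum(v * v for v in counts_a.values()))
    norm_b = math.sqrt(sum(v * v for v in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a sequence shorter than k has no k-words
    return dot / (norm_a * norm_b)


# Made-up peptide fragments, used only to exercise the functions.
print(cosine_similarity("MKVLAAGIV", "MKVLSAGIV", k=2))
print(cosine_similarity("AAAA", "CCCC", k=2))
```

A value near 1 means the two sequences use nearly the same k-words in similar proportions, while 0 means they share none.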